Abstract. We implement a new algorithm for listing all maximal cliques in
sparse graphs due to Eppstein, Löffler, and Strash (ISAAC 2010) and analyze its
performance on a large corpus of real-world graphs. Our analysis shows that this
algorithm is the first to offer a practical solution to listing all maximal cliques in
large sparse graphs. All other theoretically-fast algorithms for sparse graphs have
been shown to be significantly slower than the algorithm of Tomita et al.
(Theoretical Computer Science, 2006) in practice. However, the algorithm of Tomita
et al. uses an adjacency matrix, which requires too much space for large sparse
graphs. Our new algorithm opens the door for fast analysis of large sparse graphs
whose adjacency matrix will not fit into working memory.

Keywords: maximal clique listing, Bron-Kerbosch algorithm, sparse graphs,
d-degenerate graphs

1 Introduction 

Clique finding procedures arise in the solutions to a wide variety of important
application problems. The problem of finding cliques was first studied in social network
analysis, as a way of finding closely-interacting communities of agents in a social
network [19]. In bioinformatics, clique finding procedures have been used to find
frequently occurring patterns in protein structures [17, 26, 27], to predict the
structures of proteins from their molecular sequences [43], and to find similarities in
shapes that may indicate functional relationships between proteins [14]. Other
applications of clique finding problems include information retrieval, computer
vision [20], computational topology [50], and e-commerce [49].

For many applications, we do not want to report one large clique, but all maximal
cliques. Any algorithm which solves this problem must take exponential time in the
worst case because graphs can contain an exponential number of cliques [37]. However,
graphs with this worst-case behavior are not typically encountered in practice. More
than likely, the types of graphs that we will encounter are sparse. Therefore, the
feasibility of clique listing algorithms lies in their ability to appropriately handle sparse
input graphs. Indeed, it has long been known that certain sparse graph families, such as
planar graphs and graphs with low arboricity, contain only a linear number of cliques,
and that all maximal cliques in these graphs can be listed in linear time. In
addition, there are also several methods to list all cliques in time polynomial in the
number of cliques reported [46], which can be done faster if parameterized on a sparsity
measure such as maximum degree [36].



Many different clique-finding algorithms have been implemented, and an algorithm
of Tomita et al. [45], based on the much earlier Bron-Kerbosch algorithm [8], has been
shown through many experiments to be faster by orders of magnitude in practice than
others. An unfortunate drawback of the algorithm of Tomita et al., however, is that both
its theoretical analysis and implementation rely on the use of an adjacency matrix
representation of the input graph. For this reason, their algorithm has limited
applicability for large sparse graphs, whose adjacency matrix may not fit into working
memory. We therefore seek to have the best of both worlds: we would ideally like an
algorithm that rivals the speed of the Tomita et al. result, while having linear storage
cost.

Recently, together with Maarten Löffler, the authors developed and published a new
algorithm for listing maximal cliques, particularly optimized for the case that the
input graph is sparse [13]. This new algorithm combines features of both the algorithm
of Tomita et al. and the earlier Bron-Kerbosch algorithm on which it was based, and
maintains through its recursive calls a dynamic graph data structure representing the
adjacencies between the vertices that remain relevant within each call. When analyzed
using parameterized complexity in terms of the degeneracy of the input graph (a
measure of its sparsity), its running time is near-optimal in terms of the worst-case
number of cliques that a graph with the same sparsity could have. However, the previous
work of the authors with Löffler did not include any implementation or experimental
results showing the algorithm to be good in practice as well as in theory.

1.1 Our Results 

We implement the algorithm of Eppstein, Löffler, and Strash for listing all maximal
cliques in sparse graphs [13]. Using a corpus of many large real-world graphs, together
with synthetic data including the Moon-Moser graphs as well as random graphs, we
compare the performance of our implementation with the algorithm of Tomita et al.
For comparison, we also implement a modified version of the Tomita et al. algorithm
that uses adjacency lists in place of adjacency matrices, and a simplified version of the
Eppstein-Löffler-Strash algorithm that represents its subproblems as lists of vertices
instead of as dynamic graphs. Our results show that, for large sparse graphs, the new
algorithm is as fast as or faster than Tomita et al., and sometimes faster by very large
factors. For graphs that are not as sparse, the new algorithm is sometimes slower than
the algorithm of Tomita et al., but remains within a small constant factor of its
performance.

2 Preliminaries 

We work with an undirected graph G = (V, E) with n vertices and m edges. For a vertex
v, let $\Gamma(v)$ be its neighborhood $\{w \mid (v, w) \in E\}$, and similarly for a
subset $W \subset V$ let $\Gamma(W)$ be the set $\bigcap_{w \in W} \Gamma(w)$, the
common neighborhood of all vertices in W.

2.1 Degeneracy 

Definition 1 (degeneracy). The degeneracy of a graph G is the smallest number d such 
that every subgraph of G contains a vertex of degree at most d. 



proc Tomita(P, R, X)
    if $P \cup X = \emptyset$ then
        report R as a maximal clique
    end if
    choose a pivot $u \in P \cup X$ to maximize $|P \cap \Gamma(u)|$
    for each vertex $v \in P \setminus \Gamma(u)$ do
        Tomita($P \cap \Gamma(v)$, $R \cup \{v\}$, $X \cap \Gamma(v)$)
        $P \leftarrow P \setminus \{v\}$
        $X \leftarrow X \cup \{v\}$
    end for

Fig. 1. The Bron-Kerbosch algorithm with the pivoting strategy of Tomita et al.

Every graph with degeneracy d has a degeneracy ordering, a linear ordering of the
vertices such that each vertex has at most d neighbors later than it in the ordering.
The degeneracy of a given graph and a degeneracy ordering of the graph can both be
computed in linear time [6].
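
As an illustration of how such an ordering can be obtained, the following is a minimal
sketch that repeatedly removes a vertex of minimum remaining degree, using buckets
indexed by degree. It is not the implementation evaluated in this paper; the function
name degeneracy_ordering and the dictionary-of-sets graph representation are
assumptions made for the example.

```python
def degeneracy_ordering(adj):
    """Return (d, order) for a graph given as {vertex: set of neighbors}.

    Hypothetical helper: removes a minimum-degree vertex at each step and
    records the largest degree seen at removal time, which is the degeneracy.
    """
    degree = {v: len(adj[v]) for v in adj}
    max_deg = max(degree.values(), default=0)
    buckets = [set() for _ in range(max_deg + 1)]  # buckets[k]: vertices of degree k
    for v, deg in degree.items():
        buckets[deg].add(v)

    order, removed = [], set()
    d, lowest = 0, 0
    for _ in range(len(adj)):
        while not buckets[lowest]:        # advance to the lowest non-empty bucket
            lowest += 1
        v = buckets[lowest].pop()
        d = max(d, lowest)
        order.append(v)
        removed.add(v)
        for w in adj[v]:                  # each remaining neighbor loses one degree
            if w not in removed:
                buckets[degree[w]].remove(w)
                degree[w] -= 1
                buckets[degree[w]].add(w)
        lowest = max(lowest - 1, 0)       # the minimum can drop by at most one
    return d, order
```

In the returned order, every vertex has at most d neighbors that appear later, which is
the property used in the rest of this section.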

2.2 The Algorithm of Tomita et al. 

The algorithm of Tomita et al. [45] is an implementation of Bron and Kerbosch's
algorithm [8], using a heuristic called pivoting [26, 9]. The Bron-Kerbosch algorithm is
a simple recursive algorithm that maintains three sets of vertices: a partial clique R, a set
of candidates for clique expansion P, and a set of forbidden vertices X. In each recursive
call, a vertex v from P is added to the partial clique R, and the sets of candidates for
expansion and forbidden vertices are restricted to include only neighbors of v. If
$P \cup X$ becomes empty, the algorithm reports R as a maximal clique, but if P becomes
empty while X is nonempty, the algorithm backtracks without reporting a clique.

In the basic version of the algorithm, |P| recursive calls are made, one for each
vertex in P. The pivoting heuristic reduces the number of recursive calls by choosing a
vertex u in $P \cup X$ called a pivot. Every maximal clique must contain a non-neighbor
of u (counting u itself as a non-neighbor), since otherwise u could be added to the
clique; therefore, the recursive calls can be restricted to the intersection of P with
the non-neighbors of u.

The algorithm of Tomita et al. chooses the pivot so that u has the maximum number
of neighbors in P, and therefore the minimum number of non-neighbors, among all
possible pivots. Computing both the pivot and the vertex sets for the recursive calls can
be done in time $O(|P| \cdot (|P| + |X|))$ within each call to the algorithm, using an
adjacency matrix to quickly test the adjacency of pairs of vertices. This pivoting
strategy, together with this adjacency-matrix-based method for computing the pivots,
leads to a worst-case time bound of $O(3^{n/3})$ for listing all maximal cliques [45].
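
The recursion of Fig. 1 can be sketched in a few lines. This is an illustrative
transcription using Python sets, not the adjacency-matrix implementation whose time
bound is quoted above; the names tomita, adj, and report are assumptions made for the
example.

```python
def tomita(adj, P, R, X, report):
    """Bron-Kerbosch with the pivot rule of Fig. 1.

    adj maps each vertex to the set of its neighbors; report is a callback
    (hypothetical) that receives each maximal clique as a sorted list.
    """
    if not P and not X:
        report(sorted(R))
        return
    # Pivot: the vertex of P ∪ X with the most neighbors in P.
    u = max(P | X, key=lambda w: len(P & adj[w]))
    # Only non-neighbors of the pivot (including the pivot itself) are tried.
    for v in list(P - adj[u]):
        tomita(adj, P & adj[v], R | {v}, X & adj[v], report)
        P = P - {v}   # v has been fully explored ...
        X = X | {v}   # ... so it becomes a forbidden vertex
```

A call such as tomita(adj, set(adj), set(), set(), cliques.append) lists the maximal
cliques of the whole graph.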

2.3 The Algorithm of Eppstein, Löffler, and Strash

Eppstein, Löffler, and Strash [13] provide a different variant of the Bron-Kerbosch
algorithm that obtains near-optimal worst-case time bounds for graphs with low
degeneracy. They first compute a degeneracy ordering of the graph; the outermost call in
the recursive algorithm selects the vertices v to be used in each recursive call, in this
order, without pivoting. Then for each vertex v in the order, a call is made to the
algorithm of Tomita et al. [45] to compute all cliques containing v and v's later
neighbors, while avoiding v's earlier neighbors. The degeneracy ordering limits the size
of P within these recursive calls to be at most d, the degeneracy of the graph.

proc Degeneracy(V, E)
    for each vertex $v_i$ in a degeneracy ordering $v_0, v_1, v_2, \ldots$ of (V, E) do
        $P \leftarrow \Gamma(v_i) \cap \{v_{i+1}, \ldots, v_{n-1}\}$
        $X \leftarrow \Gamma(v_i) \cap \{v_0, \ldots, v_{i-1}\}$
        Tomita(P, $\{v_i\}$, X)
    end for

Fig. 2. The algorithm of Eppstein, Löffler, and Strash.
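
The outer loop of Fig. 2 can be sketched as follows. This is a simplified illustration
that uses Python sets and a quadratic-time ordering routine for brevity, so it does not
achieve the time or space bounds discussed in this section; all function names are
assumptions made for the example.

```python
def _tomita(adj, P, R, X, out):
    # Pivoting recursion of Fig. 1; appends each maximal clique to out.
    if not P and not X:
        out.append(sorted(R))
        return
    u = max(P | X, key=lambda w: len(P & adj[w]))
    for v in list(P - adj[u]):
        _tomita(adj, P & adj[v], R | {v}, X & adj[v], out)
        P = P - {v}
        X = X | {v}


def degeneracy_order_simple(adj):
    # Simplification: picks a minimum-degree vertex by scanning all vertices,
    # which is not linear time but yields a valid degeneracy ordering.
    remaining = {v: set(adj[v]) for v in adj}
    order = []
    while remaining:
        v = min(remaining, key=lambda w: len(remaining[w]))
        order.append(v)
        for w in remaining.pop(v):
            remaining[w].discard(v)
    return order


def list_maximal_cliques(adj):
    order = degeneracy_order_simple(adj)
    position = {v: i for i, v in enumerate(order)}
    cliques = []
    for v in order:
        P = {w for w in adj[v] if position[w] > position[v]}  # later neighbors
        X = {w for w in adj[v] if position[w] < position[v]}  # earlier neighbors
        _tomita(adj, P, {v}, X, cliques)
    return cliques
```

Because P contains only later neighbors in the ordering, its size in every top-level
call is at most the degeneracy d.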

A simple strategy for determining the pivots in each call to the algorithm of Tomita
et al., used as a subroutine within this algorithm, would be to loop over all possible
pivots in $X \cup P$ and, for each one, loop over its later neighbors in the degeneracy
ordering to determine how many of them are in P. The same strategy can also be used to
perform the neighbor intersection required for recursive calls. With the pivot selection
and set intersection algorithms implemented in this way, the algorithm would have
running time $O(d^2 n 3^{d/3})$, a factor of d larger than the worst-case output size,
which is $O(d(n - d)3^{d/3})$.

However, Eppstein et al. provide a refinement of this algorithm that stores, at each
level of the recursion, the subgraph of G with vertices in $P \cup X$ and edges having at
least one endpoint in P. Using this subgraph, they reduce the pivot computation time
to $|P|(|X| + |P|)$, and the neighborhood intersection for each recursive call to time
$|P|(|X| + |P|)$, which reduces the total running time to $O(dn3^{d/3})$. This running
time matches the worst-case output size of the problem whenever $d \le n - \Omega(n)$.
As described by Eppstein et al., storing the subgraphs at each level of the recursion may
require as much as $O(dm)$ space. But as we show in Section 3.1, it is possible to
achieve the same optimal running time with space overhead $O(n + m)$.
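
To make the matching condition concrete, the following short calculation compares the
running time bound with the output-size bound quoted above. It is a sketch only; the
constant c is introduced here for illustration and does not appear in the original
analysis.

```latex
% Assume n - d \ge c\,n for some constant 0 < c \le 1, that is, d \le n - \Omega(n).
\[
  d\,(n - d)\,3^{d/3} \;\ge\; c \cdot d\,n\,3^{d/3}
  \quad\Longrightarrow\quad
  d\,n\,3^{d/3} \;\le\; \frac{1}{c}\; d\,(n - d)\,3^{d/3} .
\]
% Hence the running time O(d n 3^{d/3}) is within the constant factor 1/c
% of the worst-case output size bound d (n - d) 3^{d/3}.
```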



2.4 Tomita et al. with Adjacency Lists 

In our experiments, we were only able to run the algorithm of Tomita et al. [45] on
graphs of small to moderate size, due to its use of the adjacency matrix representation.
In order to have a basis for comparison with this algorithm on larger graphs, we also
implemented a simple variant of the algorithm which stores the input graph in an
adjacency list representation, and which performs the pivot computation by iterating
over all vertices in $P \cup X$ and testing all neighbors for membership in P. When a
vertex v is added to R for a recursive call, we can intersect the neighborhood of v with
P and X by iterating over its neighbors in the same way.
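
The two adjacency-list operations just described can be sketched as follows. This is an
illustration rather than the code used in our experiments; it assumes that P and X are
Python sets supporting constant-time membership tests, and the names choose_pivot and
restrict are hypothetical.

```python
def choose_pivot(adj_list, P, X):
    """Return the vertex of P ∪ X with the most neighbors in P.

    Each candidate's adjacency list is scanned once, so the cost is
    proportional to the total degree of the vertices in P ∪ X.
    """
    best, best_count = None, -1
    for u in list(P) + list(X):
        count = sum(1 for w in adj_list[u] if w in P)
        if count > best_count:
            best, best_count = u, count
    return best


def restrict(adj_list, v, P, X):
    """Intersect the neighborhood of v with P and with X by one scan of its list."""
    new_P = {w for w in adj_list[v] if w in P}
    new_X = {w for w in adj_list[v] if w in X}
    return new_P, new_X
```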

Let $\Delta$ be the maximum degree of the given input graph; then the pivot
computation takes time $O(\Delta(|X| + |P|))$. Additionally, preparing subsets for all
recursive calls takes time $O(|P|\Delta)$. Fitting these facts into the analysis of
Tomita et al. gives us an $O(\Delta(n - \Delta)3^{\Delta/3})$ time algorithm. $\Delta$
may be significantly larger than the degeneracy, so this algorithm's theoretical time
bounds are not as good as those of Tomita et al. or Eppstein et al.; nevertheless, the
simplicity of this algorithm makes it competitive with the others for many problem
instances.



3 Implementation and Experiments

We implemented the algorithm of Tomita et al. using the adjacency matrix
representation, and the simple adjacency list representation for comparison. We also
implemented three variants of the algorithm of Eppstein, Löffler, and Strash: one with
no data structuring, using the fact that vertices have few later neighbors in the
degeneracy ordering; an implementation of the dynamic graph data structure that only
uses $O(m + n)$ extra space total; and an alternative implementation of the data
structure based on bit vectors. The bit vector implementation executed no faster than the
data structure implementation, so we omit its experimental timings and any discussion of
its implementation details.



3.1 Implementation Details 

We maintain the sets of vertices P and X in a single array, which is passed between
recursive calls. Initially, the array contains the elements of X followed by the elements
of P. We keep a reverse lookup table, so that we can look up the index of a vertex
in constant time. With this lookup table, we can tell whether a vertex is in P or X in
constant time, by testing that its index is in the appropriate subarray. When a vertex
v is added to R in preparation for a recursive call, we reorder the array. Vertices in
$\Gamma(v) \cap X$ are moved to the end of the X subarray, and vertices in
$\Gamma(v) \cap P$ are moved to the beginning of the P subarray (see Figure 3). We then
make a recursive call on the subarray containing the vertices
$\Gamma(v) \cap (X \cup P)$. After the recursive call, we move v to X by swapping it to
the beginning of the P subarray and moving the boundary so that v is in the X subarray.
Of course, moving vertices between sets will affect P and X in higher recursive calls.
Therefore, in a given recursive call, we maintain a list of the vertices that are moved
from P to X, and move these vertices back to P when the call ends.
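
The array layout described above can be sketched as follows. This is an illustrative
model rather than our C implementation; the class name VertexSets, its method names, and
the convention of passing subarray boundaries as integer indices are assumptions made
for the example.

```python
class VertexSets:
    """X and P in one array: X = array[begin_x:begin_p], P = array[begin_p:end_p]."""

    def __init__(self, x_vertices, p_vertices):
        self.array = list(x_vertices) + list(p_vertices)
        # Reverse lookup table: vertex -> current index in the array.
        self.index = {v: i for i, v in enumerate(self.array)}

    def _swap(self, i, j):
        a, b = self.array[i], self.array[j]
        self.array[i], self.array[j] = b, a
        self.index[a], self.index[b] = j, i

    def restrict(self, neighbors, begin_x, begin_p, end_p):
        """Gather neighbors around the X/P boundary; return the new (begin_x, end_p)."""
        new_begin_x, new_end_p = begin_p, begin_p
        for w in neighbors:
            i = self.index.get(w, -1)
            if begin_x <= i < new_begin_x:      # w in X: move to the end of X
                new_begin_x -= 1
                self._swap(i, new_begin_x)
            elif new_end_p <= i < end_p:        # w in P: move to the start of P
                self._swap(i, new_end_p)
                new_end_p += 1
        return new_begin_x, new_end_p

    def move_to_x(self, v, begin_p):
        """Move v from P to X by swapping it to the boundary; return the new begin_p."""
        self._swap(self.index[v], begin_p)
        return begin_p + 1
```

The recursive call then operates on array[new_begin_x:new_end_p] with the boundary still
at begin_p. The sketch omits the per-call list of vertices moved from P to X that is
used to restore P when a call ends.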

Fig. 3. When a vertex v is added to the partial clique R, its neighbors in P and X
(highlighted in this example) are moved toward the dividing line in preparation for the
next recursive call.

Fig. 4. For each vertex in $P \cup X$, we keep an array containing neighbors in P. We
update these arrays whenever a vertex is moved from P to R, and whenever we need to
intersect a neighborhood with P and X for a recursive call.



The pivot computation data structure is stored as a set of arrays, one for each
potential pivot vertex u in $P \cup X$, containing the neighbors of u in P. Whenever P
changes, we reorder the elements in these arrays so that neighbors in P are stored first
(see Figure 4). Computing the pivot is as simple as iterating through each array until
we encounter a neighbor that is not in P. This reordering procedure allows us to
maintain one set of arrays throughout all recursive calls, requiring linear space total.
Making a new copy of this data structure for each recursive call would require space
$O(dm)$.
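
The reordering idea can be illustrated with the following sketch. It is a simplified
model of the data structure, not our implementation: it repartitions a whole neighbor
array at once, and the function names and the in_P membership callback are assumptions
made for the example.

```python
def move_P_neighbors_first(neighbor_array, in_P):
    """Partition in place so that neighbors in P come first; return how many there are."""
    boundary = 0
    for i, w in enumerate(neighbor_array):
        if in_P(w):
            neighbor_array[i], neighbor_array[boundary] = neighbor_array[boundary], w
            boundary += 1
    return boundary


def count_neighbors_in_P(neighbor_array, in_P):
    """Scan until the first neighbor outside P (valid once the array is partitioned)."""
    count = 0
    for w in neighbor_array:
        if not in_P(w):
            break
        count += 1
    return count


def choose_pivot(neighbor_arrays, candidates, in_P):
    """Pick the candidate whose array starts with the longest run of P-neighbors."""
    return max(candidates,
               key=lambda u: count_neighbors_in_P(neighbor_arrays[u], in_P))
```

Because the arrays are permuted in place rather than copied, one set of arrays can serve
every level of the recursion.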

3.2 Results 

We implemented all algorithms in the C programming language, and ran experiments
on a Linux workstation running the 32-bit version of Ubuntu 10.10, with a 2.53 GHz
Intel Core i5 M460 processor (with three cache levels of 128KB, 512KB, and 3,072KB
respectively) and 2.6GB of memory. We compiled our code with version 4.4.5 of the
gcc compiler with the -O2 optimization flag.

In our tables of results, "tomita" is the algorithm of Tomita et al., "maxdegree" is the
simple implementation of Tomita et al.'s algorithm for adjacency lists, and "hybrid" and
"degen" are the implementations of Eppstein, Löffler, and Strash with no data structure
and with the linear space data structure, respectively. We provide the elapsed
running times (in seconds) for each of these algorithms; an asterisk indicates that the
algorithm was unable to run on that problem instance due to time or space limitations.
In addition, we list the number of vertices n, edges m, the degeneracy d, and the number
of maximal cliques μ.

Our primary experimental data consisted of four publicly-available databases of 
real-world networks, including non-electronic and electronic social networks as well as 
networks from bioinformatics applications. 

- A database curated by Mark Newman [40] (Table 1), which consists primarily of
  social networks; it also includes word co-occurrence data and a biological neural
  network. Many of its graphs were too small for us to time our algorithms accurately,
  but our algorithm was faster than that of Tomita et al. on all four of the largest
  graphs; in one case it was faster by a factor of approximately 130.

- The BioGRID data [44] (Table 2) consists of several protein-protein interaction
  networks with one to several thousand vertices, and varying sparsities. Our
  algorithm was significantly faster than that of Tomita et al. on the worm and fruitfly
  networks, and matched or came close to its performance on all the other networks,
  even the relatively dense yeast network.

- We also tested six large social and bibliographic networks that appeared in the Pajek
  data set but were not in the other data sets [7] (Table 3). Our algorithm was
  consistently faster on these networks. Due to their large size, the algorithm of
  Tomita et al. was unable to run on two of these networks; nevertheless, our algorithm
  found all cliques quickly in these graphs.

- We also tested a representative sample of graphs from the Stanford Large Network
  Dataset Collection [32] (Table 4). These included road networks, a co-purchasing
  network from Amazon.com data, social networks, email networks, a citation
  network, and two Web graphs. Nearly all of these input graphs were too large for the
  Tomita et al. algorithm to fit into memory. For graphs which are extremely sparse,
  it is no surprise that the maxdegree algorithm was faster than our algorithm, but
  our algorithm was consistently fast on each of these data sets, whereas the
  maxdegree algorithm was orders of magnitude slower than our algorithm on the large
  soc-wiki-Talk network.



Table 1. Experimental results for Mark Newman's data sets [40].

Table 2. Experimental results for BioGRID data sets (PPI Networks) [44].

Table 3. Experimental results for Pajek data sets [7].

Table 4. Experimental results for Stanford data sets [32].

As a reference point, we also ran our experimental comparisons using the two sets
of graphs that Tomita et al. used in their experiments. First, Tomita et al. used a data
set from a DIMACS challenge, a collection of graphs that were intended as difficult
examples for clique-finding algorithms, and that have been algorithmically generated
(Table 5). Second, they generated graphs randomly with varying edge densities;
in order to replicate their results we generated another set of random graphs with the
same parameters (Table 6). The algorithm of Eppstein, Löffler, and Strash runs about
2 to 3 times slower than that of Tomita et al. on many of these graphs; this confirms
that the algorithm is still competitive on graphs that are not sparse, in contrast to the
competitors in Tomita et al.'s paper which ran 10 to 160 times slower on these input
graphs. The largest of the random graphs in the second data set were generated with
edge probabilities that made them significantly sparser than the rest of the set; for those
graphs our algorithm outperformed that of Tomita et al. by a factor that was as large
as 30 on the sparsest of the graphs. The maxdegree algorithm was even faster than our
algorithm in these cases, but it was significantly slower on other data.



Table 5. Experimental results for Moon-Moser [37] and DIMACS benchmark graphs [22].

4 Conclusion

We have shown that the algorithm of Eppstein, Löffler, and Strash is a practical
algorithm for large sparse graphs. This algorithm is highly competitive with the
algorithm of Tomita et al. on sparse graphs, and within a small constant factor on other
graphs. The advantage of this algorithm is that it requires only linear space for storing
the graph and all data structures. It does not suffer from the drawback of requiring an
adjacency matrix, which may not fit into memory. Its closest competitor in this respect,
the Tomita et al. algorithm modified to use adjacency lists, is sometimes faster by a
small factor but is also sometimes slower by a large factor. Thus, the algorithm of
Eppstein et al. is a fast and reliable choice for listing maximal cliques, especially
when the input graphs are large and sparse.

For future work, it would be interesting to compare our results with those of other
popular clique listing algorithms. We attempted to include results from Patric
Östergård's popular Cliquer program in our tables; however, at the time of writing, its
newly implemented functionality for listing all maximal cliques returns incorrect
results.

Table 6. Experimental results on random graphs.

     n      p    d            μ   tomita  maxdegree   hybrid    degen
   100    0.6   51       59,898     0.08       0.26     0.25     0.14
   100    0.7   59      439,928     0.50       2.04     1.85     0.99
   100    0.8   70    5,776,276     6.29      28.00    24.86    11.74
   100    0.9   81  240,998,654   249.15    1136.15  1028.84   425.85
   300    0.1   21        3,663    <0.01       0.01     0.01    <0.01
   300    0.2   47       18,911     0.02       0.07     0.08     0.05
   300    0.3   74       86,179     0.10       0.44     0.49     0.24
   300    0.4  101      555,724     0.70       4.24     3.97     1.67
   300    0.5  130    4,151,668     5.59      42.37    36.35    13.05
   300    0.6  162   72,454,791   101.35     958.74   755.86   227.00
   500    0.1   39       15,311     0.02       0.03     0.06     0.04
   500    0.2   81       98,875     0.11       0.46     0.61     0.27
   500    0.3  127      701,292     0.86       5.90     6.10     2.29
   500    0.5  225  103,686,974   151.67    1888.20  1521.90   375.23
   700    0.1   56       38,139     0.04       0.10     0.19     0.09
   700    0.2  117      321,245     0.37       2.01     2.69     1.00
   700    0.3  184    3,107,208     4.06      36.13    38.12    11.47
 1,000    0.1   82       99,561     0.11       0.34     0.70     0.28
 1,000    0.2  172    1,190,899     1.45      10.35    14.48     4.33
 1,000    0.3  266   15,671,489    21.96     262.64   280.58    66.05
 2,000    0.1  170      750,991     1.05       5.18    11.77     3.13
 3,000    0.1  263    2,886,628     4.23      27.51    68.52    13.62
10,000  0.001    7       49,716     1.19       0.04     0.07     0.07
10,000  0.003   21      141,865     1.30       0.11     0.36     0.26
10,000  0.005   38      215,477     1.47       0.25     1.03     0.51
10,000   0.01   80      349,244     2.20       1.01     5.71     1.66
10,000   0.03  262    3,733,699     9.96      20.66   133.94    20.67